arXiv:1506.04374v1 [q-bio.NC] 14 Jun 2015


Head-related Impulse Response Cues for Spatial Auditory
Brain-computer Interface


Chisaki Nakaizumi 1, Shoji Makino 1, and Tomasz M. Rutkowski 1,2,3,*


Abstract: This study provides a comprehensive test of
head-related impulse response (HRIR) cues for a spatial
auditory brain-computer interface (saBCI) speller paradigm.
We present a comparison with the conventional virtual sound
headphone-based spatial auditory modality. We propose and
optimize three types of sound spatialization settings using
a variable elevation in order to evaluate the HRIR efficacy
for the saBCI. Three experienced and seven naive BCI users
participated in the three experimental setups based on ten
presented Japanese syllables. The obtained EEG auditory
evoked potentials (AEP) showed encouragingly good and
stable P300 responses in online BCI experiments. Our case
study indicated that users could perceive elevation in the
saBCI experiments generated using the HRIR measured from
a general head model. The saBCI accuracy and information
transfer rate (ITR) scores improved compared with the
classical horizontal plane-based virtual spatial sound
reproduction modality, as far as the healthy users in the
current pilot study are concerned.

I. Introduction 

BCI is a technology that uses brain neuronal signals
to operate a computer without any muscle movements.
Therefore, it is expected to provide a speller for disabled
people such as patients suffering from amyotrophic
lateral sclerosis (ALS) [1]. Although a successful visual BCI
modality can currently provide a fast speller, advanced
patients who are in a locked-in state cannot use it
because they lose all intentional muscle control, including
even eye blinks [2]. Auditory BCI can be an alternative
method because it does not require good sight or eye
movements [1], [3]-[6]. We propose to extend the spatial
auditory BCI (saBCI) paradigms previously published by our
group [3], [6] by making use of a head-related impulse
response (HRIR) for the reproduction of virtual spatial sound
images with headphones. Our research target is a virtual
sound saBCI using HRIR-based spatial cues to create a
non-invasive and auditory stimulus-driven paradigm, which
does not require long training. HRIR appends
interaural-intensity-differences (IID),
interaural-time-differences (ITD), and spectral modifications
to create the spatial stimuli, while vector-based amplitude
panning (VBAP) appends only the IID. The HRIR allows for more
precise and fully spatial positioning of virtual sound images,
even when the HRIR measurements are not the user's own [7].

In our previous study [8] we evaluated saBCI feasibility
with the HRIR-based spatial sound generator, and compared
it with formerly reported vector-based-amplitude-panning
(VBAP)-based spatial auditory experiments [6]. That
study used only five Japanese vowels distributed horizontally
every 40° in front of the subject's head. The pilot study
resulted in clear P300 responses and showed that the HRIR
modality could improve spelling accuracy compared with the
conventional spatial sound generation methods. In the study
presented in this paper we introduce, as the second step, a
speller based on ten Japanese kana syllables that uses sound
elevation features. We also compare various sound elevation
settings. However, it is usually difficult to precisely perceive
various sound elevations with an HRIR that is not the user's
own, because elevation perception is highly individual and
is influenced by the shape of the auricle [7]. Unfortunately,
it would be very difficult to record the HRIR of a bedridden
ALS patient. Therefore, we propose to test the HRIR-based
saBCI efficacy using a general head model (KEMAR).
We conduct psychophysical and online EEG experiments with
the ten Japanese kana syllables saBCI speller in order to
test our research hypothesis that the KEMAR HRIR-based
paradigm is practically feasible.

The rest of the paper is organized as follows: the next
section describes the methods and experimental setup; the
third section reports the classification accuracies and
information transfer rate (ITR) of the P300 response-based
saBCI speller in comparison with a conventional method;
finally, conclusions and future research directions summarize
the paper.

II. Materials and Methods 

In order to prepare the saBCI stimuli for an oddball
paradigm generating the P300 responses [2], all spatial sound
images were created using the HRIR of the general head model.
Ten Japanese syllables with the vowel "a" were selected as
sound stimuli in this project. The spoken syllables were taken
from a public Japanese sound dataset of female speech [9].
The monaural sound stimuli were spatially distributed using
the public domain KEMAR HRTF database provided by
the MIT Media Lab [10]. In order to generate a stereo
sound placed at a spatial location with an azimuth of $\theta$
and an elevation of $\phi$, the following procedure was applied.
Let $h_{l,\theta,\phi}$ and $h_{r,\theta,\phi}$ be the
minimum-phase impulse responses from the KEMAR HRTF database
measured at the chosen azimuth $\theta$ and elevation $\phi$
at the left ($l$) and right ($r$) ears.

Fig. 1. A diagram of the three proposed virtual spatial stimulus sound
settings. The left box depicts the horizontal-only placement setting;
the middle box the setting using elevation; and the right box the setting
using elevation with an additional frequency shift.

The stereo spatial sound delivered via headphones to the left
and right ears could be constructed in the time domain using
the HRIR, as a two-channel signal composed of the
left $x_l(t)$ and right $x_r(t)$ headphone channels, as follows,

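$$
\begin{aligned}
x_l(t) &= \sum_{\tau=0}^{n-1} h_{l,\theta,\phi}(\tau)\, s(t-\tau), \\
x_r(t) &= \sum_{\tau=0}^{n-1} h_{r,\theta,\phi}(\tau)\, s(t-\tau),
\end{aligned}
\tag{1}
$$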

where $\tau$ denotes the sample time delay, $s(t)$ is the
monaural sound stimulus, and $n$ is the HRIR length
as obtained from the HRTF database [11]. The spatial acoustic
stimuli created in this way were delivered to the left and right
ears of the user through ear-fitting portable headphones
(SENNHEISER CX 400II). The three proposed spatial settings
for evaluating the feasibility of sound elevation using the
HRIR for the saBCI are presented in Figure 1. The first
spatial stimulus setting included only a horizontal placement
of the sound images of the ten saBCI commands. The sound
images were distributed in five directions every 45° on
the horizontal plane at the level of the user's ears, so two
commands were delivered from each spatial direction.
The second stimulus setting included sound elevation
variability. The ten stimuli were all distributed in different
directions: five commands were localized at an elevation
of 50° above the level of the user's ears, every 45° horizontally.
The third setting also included the variable elevation, with
an additional frequency modulation appended to help
discriminate the sound images originating from different
elevations. In psychoacoustics there is a well-known
property called a tonal bell, which causes a higher-frequency
sound to be perceived as originating from a higher elevation [7].
To simulate this effect we shifted in the frequency domain
the stimuli that were to be perceived at the higher elevation,
using the TANDEM-STRAIGHT method [12]. The fundamental
frequencies of the sound stimuli to be perceived at the
elevation of 50° were shifted up by 24 Hz.
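
Equation (1) is a finite convolution of the monaural stimulus with the left-ear and right-ear HRIRs. The following minimal Python sketch illustrates it under stated assumptions: the function names `convolve_hrir` and `spatialize` are hypothetical, the signal is taken as zero before its first sample, and the three-tap impulse responses are toy values chosen for illustration, not KEMAR measurements.

```python
def convolve_hrir(s, h):
    """Compute x(t) = sum_{tau=0}^{n-1} h(tau) * s(t - tau).

    `s` is the monaural stimulus and `h` an impulse response of length n.
    Samples of `s` before t = 0 are assumed to be zero.
    """
    n = len(h)
    x = []
    for t in range(len(s)):
        acc = 0.0
        # Only taps with t - tau >= 0 contribute.
        for tau in range(min(n, t + 1)):
            acc += h[tau] * s[t - tau]
        x.append(acc)
    return x


def spatialize(s, h_left, h_right):
    """Return the (left, right) headphone channels for one sound image."""
    return convolve_hrir(s, h_left), convolve_hrir(s, h_right)


# Toy data (assumed values): the right ear receives the sound two samples
# later and at half the amplitude of the left ear.
s = [1.0, 0.5, 0.25, 0.0]
h_left = [1.0, 0.0, 0.0]
h_right = [0.0, 0.0, 0.5]

x_l, x_r = spatialize(s, h_left, h_right)
print(x_l)  # [1.0, 0.5, 0.25, 0.0]
print(x_r)  # [0.0, 0.0, 0.5, 0.25]
```

In this toy case the right channel is delayed and attenuated relative to the left one, which mimics the ITD and IID cues; a measured HRIR would additionally impose the spectral shaping that carries the elevation information.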

All of the experiments reported in this paper were
performed in the Life Science Center of TARA, University of
Tsukuba, Japan. Ten healthy users participated in our study:
seven naive and three experienced BCI users. Their average
age was 24.7 years (standard deviation 6.48 years; three
males and seven females). The psychophysical and online EEG
saBCI experiments were conducted in accordance with The World
Medical Association Declaration of Helsinki - Ethical
Principles for Medical Research Involving Human Subjects.
The experimental procedures were approved and designed in
agreement with the ethical committee guidelines of the Faculty
of Engineering, Information and Systems at the University of
Tsukuba, Japan.

The psychophysical experiments were conducted to
examine the perception of elevation and the preferences for
each spatial sound setting. The users were instructed to
respond by pressing a button as soon as possible after they
perceived the target stimulus, as in the classical oddball
paradigm [2]. In a single experimental session 10 targets and
90 non-targets were presented. Each experiment comprised three
sessions for every spatial sound setting. The stimulus
duration was set to 300 ms and the inter-stimulus-interval (ISI)
to 700 ms. The online EEG experiments were conducted
to investigate whether P300 responses could be evoked in
the various spatial sound settings and to compare the saBCI
classification accuracies, as well as the efficacy of each setup.
The brain signals were collected with a g.USBamp bio-signal
amplifier system by g.tec Medical Engineering GmbH,
Austria. The EEG signals were captured by sixteen active
gel-based g.LADYbird electrodes attached to the following
head locations: Cz, Pz, P3, P4, CP5, CP6, P1, P2, POz,
C1, C2, FC1, FC2, and FCz, as in the extended 10/10
international system [2]. The ground electrode was attached
on the forehead at the FPz location, and the reference on
the user's left earlobe. An in-house extended
BCI2000 [13] software was used for the saBCI experiments
to present stimuli and display online classification results.
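
The 10 target and 90 non-target presentations of one session correspond to each of the ten syllables being played ten times while the user attends to one of them. A minimal sketch of such a sequence follows. The block-wise shuffling and the function name `oddball_sequence` are assumptions made for illustration, since the randomization scheme is not specified here, and the romanized syllable list is the set of kana that appears in the words spelled in the EEG test runs.

```python
import random

# Ten Japanese syllables with the vowel "a" (romanized).
SYLLABLES = ["a", "ka", "sa", "ta", "na", "ha", "ma", "ya", "ra", "wa"]


def oddball_sequence(target, repetitions, rng):
    """Return a stimulus list in which every syllable occurs `repetitions` times.

    Each block is a random permutation of the ten syllables (an assumed
    randomization scheme), so the target makes up 10% of all stimuli.
    """
    if target not in SYLLABLES:
        raise ValueError(f"unknown target syllable: {target!r}")
    sequence = []
    for _ in range(repetitions):
        block = SYLLABLES[:]
        rng.shuffle(block)
        sequence.extend(block)
    return sequence


seq = oddball_sequence("ta", repetitions=10, rng=random.Random(0))
n_targets = seq.count("ta")
n_nontargets = len(seq) - n_targets
print(n_targets, n_nontargets)  # 10 90
```

With `repetitions=15` and `repetitions=5` the same sketch yields the 15/135 and 5/45 target/non-target counts used in the EEG training and test runs.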

A single EEG experimental session comprised
one training and two test runs for each spatial stimulus
setting. In the training run, EEG brainwaves were recorded
and the classifier parameters were then calculated. In order to
spell a single character, 15 targets and 135 non-targets were
presented for ERP response averaging. In the test runs, the
users spelled two sets of three words that covered the ten
targets (a-ta-ma; ha-ra; sa-wa-ya-ka-na; wa-ta; ha-na-ya;
a-ka-ra-sa-ma). For spelling a single character, 5 target
and 45 non-target stimuli were presented continuously
(a five-ERP averaging scenario). The order of the spatial
sound settings and of the spelled words was randomized for
each subject. The sound stimulus duration
was set to 300 ms and the inter-stimulus-interval (ISI) to
150 ms. The EEG sampling rate was 512 Hz, and a notch
filter with a rejection band of 48 to 52 Hz was applied to
remove the 50 Hz power line interference. The band-pass
filter was set with 0.1 Hz and 60 Hz cutoff frequencies. The
acquired EEG brain signals were classified online by the
in-house extended BCI2000 application using a stepwise linear
discriminant analysis (SWLDA) [14] classifier with features
drawn from the 0 to 800 ms ERP interval, with removal of
the least significant input features, having p > 0.15, and with
the final discriminant function restricted to contain a
maximum of 60 features. After finishing the psychophysical
and EEG experiments, the participating users also answered
questionnaires asking which modality they preferred.

Fig. 2. Grand mean averaged EEG ERP responses, together with standard
error bars, for all the users participating in the study, plotted
separately for representative electrodes. Purple lines depict the grand
mean averaged targets with clear P300 responses, while blue traces are
the non-targets. Eye blink artifacts were removed with 80 µV thresholding.
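
The averaging of a few single-trial ERPs and the amplitude thresholding mentioned in the caption of Fig. 2 can be sketched as follows. This is an illustrative example only: the function name `average_erp` is hypothetical, the rejection rule (discard an epoch if any sample exceeds 80 µV in absolute value) is one assumed reading of the thresholding, and the short epochs hold made-up values. At the 512 Hz sampling rate, a 0 to 800 ms epoch would span roughly 410 samples per channel.

```python
def average_erp(epochs, threshold_uv=80.0):
    """Average equal-length single-channel epochs (values in microvolts).

    Epochs whose absolute amplitude exceeds `threshold_uv` at any sample
    are treated as artifacts and skipped (assumed rejection rule).
    Returns the averaged waveform and the number of epochs kept.
    """
    clean = [e for e in epochs if max(abs(v) for v in e) <= threshold_uv]
    if not clean:
        raise ValueError("all epochs were rejected as artifacts")
    n_samples = len(clean[0])
    mean = [sum(e[i] for e in clean) / len(clean) for i in range(n_samples)]
    return mean, len(clean)


# Five toy target epochs (five-ERP averaging); one has a blink-like spike.
target_epochs = [
    [0.0, 2.0, 6.0, 2.0],
    [1.0, 3.0, 7.0, 3.0],
    [0.0, 120.0, 5.0, 1.0],  # rejected: exceeds 80 uV
    [-1.0, 1.0, 5.0, 1.0],
    [0.0, 2.0, 6.0, 2.0],
]
erp, kept = average_erp(target_epochs)
print(kept)  # 4
print(erp)   # [0.0, 2.0, 6.0, 2.0]
```

Averaging more epochs per selection improves the signal-to-noise ratio of the P300 but lengthens each selection, which is the trade-off discussed for the ITR in the results.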

III. Results 

In the psychophysical experiments the users could perceive
the various elevation settings. The questionnaire answers
indicated that the frequency shifting supported the stimulus
discrimination. The elevation and frequency variable setting
resulted in the best perception of the spatial sound image
stimuli. Four out of six participants preferred the elevation
and frequency-based spatial sound setting. The results of the
saBCI EEG experiment are depicted in Figure 2. Each
column presents the grand mean averaged ERP results at
four representative electrodes for the three proposed spatial
sound settings. We confirmed clear P300 responses in
latency ranges of 400 to 1000 ms and their usability for
the subsequent classification. Figure 3 presents the
classification accuracies of the P300 responses as obtained with
the SWLDA classifier. The theoretical chance level was
10% in this study. The best accuracy of every user was above
70%, and three users reached 100% accuracy as their best
result in the reported experiments.
The average scores were obtained as mean values calculated
from the two saBCI test spelling sessions of all the users
(training sessions were not included in the saBCI accuracy
calculations). The results are shown in Figure 4. There were
significant differences between the horizontal and elevation
settings, and between the elevation-only and the elevation and
frequency settings, as calculated with the t-test (the
significance level was p < 0.05). Seventy percent of the users
preferred the elevation-based spatial sound setting. The
important outcome of the presented study was that the users
could clearly perceive elevation variability with HRIR filters
that were not their own.

Fig. 3. The average saBCI spelling accuracies of all the users. The chance
level was 10%. The blue bars depict the saBCI accuracy of the horizontal
sound image placement; green bars represent the elevation-based results;
and orange bars the elevation and frequency setting. Users
#4, #5 and #10 were the experienced BCI users, while the remaining ones
were naive participants.

Although the results of the psychophysical experiments have
shown that the users preferred the elevation and frequency
spatial sound setting, the online saBCI accuracies suggested
that the P300 responses generated in the elevation variability
setting were the best.
In order to compare the developed HRIR-based saBCI
we calculated the information transfer rate (ITR) scores,
considered a major comparison measure among BCI
paradigms [4]. ITR scores were calculated as follows:


$$
\mathrm{ITR} = V \cdot R, \tag{2}
$$

$$
R = \log_2 N + P \log_2 P + (1 - P) \log_2 \frac{1 - P}{N - 1},
$$

where $V$ was the classification speed in selections/minute; $R$
represented the number of bits/selection; $N$ was the number
of classes (10 in this study); and $P$ was the classification
accuracy. The conditions contributing to the ITR were in a
trade-off relationship with the ease of the task. For example,
a short ISI could improve the ITR, but it could also make the
task difficult. The ITR could increase with a larger number of
commands, higher accuracies, a shorter ISI, and a smaller
number of averaged trials. We also compared the ITR scores of
the three proposed modalities with our previous project results
and with the vector-based-amplitude-panning (VBAP)-based
spatial auditory approach, which was regarded as a
conventional method [6]. The VBAP experiment was conducted
for two sessions and with 16 BCI-naive users [6]. The
electrode positions were the same as in the experiments
reported in this paper. The sound stimuli were presented with
small ear-fitting headphones (SENNHEISER CX 400II)
in all the studies. The spatial locations of the stimulus
sound images, the number of commands, and the ISI settings
are summarized in Table I. The ITR score of the elevation
spatial setting reached 14.92 bit/min. The other proposed
modalities scored above 8.5 bit/min and also exceeded our
previous research results [8]. The headphone-based virtual
saBCI took one step forward.

Fig. 4. The averaged saBCI spelling accuracies for each of the tested
spatial sound settings. The theoretical chance level was 10%. The blue
color depicts the accuracy of the horizontal sound setting; green the
elevation variability; and orange the elevation and frequency case. The
pie chart on the right depicts the user preferences among the three
spatial sound settings with the same color coding.

Although the HRIR-based saBCI resulted in better accuracy
scores, these results cannot be compared for statistical
significance with the former study due to the different subject
groups. We also analyzed the spelling accuracy for the various
horizontal azimuths, as shown in Table II. Compared with the
horizontal placement modality, the remaining two spatial sound
settings resulted in fewer classification errors. The results
have shown that the variable stimulus elevation also helped
the users to better distinguish the horizontal sound locations.
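
Equation (2) can be checked numerically against the ITR values of the proposed settings in Table I. The sketch below is illustrative: the function names are hypothetical, and the selection speed is not stated explicitly in the text, so it is assumed here that one selection takes 50 stimuli (5 targets and 45 non-targets) with one stimulus onset every 150 ms, that is 7.5 s per selection or 8 selections/minute.

```python
import math


def bits_per_selection(n_classes, accuracy):
    """R from Eq. (2); terms with a zero probability contribute nothing."""
    r = math.log2(n_classes)
    if accuracy > 0.0:
        r += accuracy * math.log2(accuracy)
    if accuracy < 1.0:
        r += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_classes - 1))
    return r


def itr_bits_per_minute(n_classes, accuracy, selections_per_minute):
    """ITR = V * R, in bit/min."""
    return selections_per_minute * bits_per_selection(n_classes, accuracy)


# Assumption: 50 stimuli per selection, one stimulus onset every 150 ms.
seconds_per_selection = 50 * 0.150
V = 60.0 / seconds_per_selection  # 8 selections/minute

for mode, accuracy in [("horizontal", 0.595),
                       ("elevation", 0.780),
                       ("elev. and freq.", 0.605)]:
    print(mode, f"{itr_bits_per_minute(10, accuracy, V):.2f}")
# horizontal 8.51
# elevation 14.92
# elev. and freq. 8.81
```

Under this assumed speed the three values agree with Table I to two decimals. The previous HRIR and VBAP rows used other command counts and ISI settings, so they would need their own selection speeds.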

IV. Conclusions 

The presented EEG results confirmed the feasibility of P300
responses among the experienced and naive saBCI users.
The proposed spatial sound stimulus settings for various
elevations obtained using the HRIR were effective for the saBCI
paradigm. Additionally, the short ISI did not distract the
users' perception; rather, it apparently sharpened it, resulting
in better saBCI accuracies.

The obtained ITRs were better than those of our previous
study using a simple HRIR and of the VBAP-based
spatial sound virtualization.

Nevertheless, the current study is not yet ready to compete
with the faster visual BCI spellers. Furthermore, it is
necessary to improve the ITR for comfortable online
saBCI-based spelling. We plan to extend the proposed saBCI to
realize a speller based on the full set of Japanese kana
characters, and to design a more effective spatial sound
placement using more precise elevation settings.


TABLE I

The averaged accuracies and ITRs of the proposed, our previous HRIR
project and the conventional VBAP methods

Method                              Spatial sound mode   Averaged accuracy   ITR (bit/min)
Proposed method with the use of     Horizontal           59.5%               8.51
various HRIRs; 10 commands;         Elevation            78.0%               14.92
ISI = 150 ms                        Elev. and freq.      60.5%               8.81
Previous HRIR; 5 commands;          Horizontal           55.8%               1.79
ISI = 300 ms
Conventional VBAP; 5 commands;      Horizontal           45.6%               0.57
ISI = 500 ms


TABLE II

The mean spelling accuracy for various horizontal azimuths

                          Sound azimuth-based saBCI mean accuracies
Sound mode                -90°    -45°    0°      45°     90°
Horizontal                85%     80%     95%     90%     90%
Elevation                 90%     95%     95%     90%     95%
Elevation and frequency   90%     95%     90%     85%     100%